Sensitivity And Out-Of-Sample Error in 
Continuous Time Data Assimilation 



Jochen B rocker and Ivan G. Szendro 
Max-Planck-Institut fiir Physik komplexer Systeme 
Nothnitzer Strasse 34 
01187 Dresden 
Germany 

September 5, 2011 

Abstract 

Data assimilation refers to the problem of finding trajectories of a 
prescribed dynamical model in such a way that the output of the model 
(usually some function of the model states) follows a given time series of 
observations. Typically though, these two requirements cannot both be 
met at the same time — tracking the observations is not possible without 
the trajectory deviating from the proposed model equations, while adher- 
ence to the model requires deviations from the observations. Thus, data 
assimilation faces a trade-off. In this contribution, the sensitivity of the 
data assimilation with respect to perturbations in the observations is iden- 
tified as the parameter which controls the trade-off. A relation between 
the sensitivity and the out-of-sample error is established which allows 
to calculate the latter under operational conditions. A minimum out-of- 
sample error is proposed as a criterion to set an appropriate sensitivity 
and to settle the discussed trade-off. Two approaches to data assimilation 
are considered, namely variational data assimilation and Newtonian nudg- 
ing, aka synchronisation. Numerical examples demonstrate the feasibility 
of the approach. 

1 Introduction 

Data Assimilation is one of several names for a problem (or class of problems) 
which in broad terms might be described thus: Suppose we are given a time 
series of observations, generated by some dynamical process. Further, we are 
given a dynamical model of the considered process, derived either from first 
principles, data analysis or other procedure. The task is to find orbits of the 
considered model which are consistent with the given time series of observations. 
We will restrict ourselves here to the calculation of individual orbits, but ideally, 
some form of distribution of orbits given the observations would be desired. 
Usually, the model evolves in some state space which is not identical to the space 
in which the observations live. Rather, it is assumed that there are unobserved 
degrees of freedom, and the trajectories are supposed to follow the observations 
only after some function has been applied. Clearly, data assimilation appears in 
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many branches of science and engineering in various different context and guises 
and in connection with different objectives. Reasons as to why data assimilation 
is performed might be assessment of the assimilating model, investigation of the 
underlying dynamics, and to obtain initial conditions for dynamical forecasts of 
future observations. 

Naturally, a large variety of different solutions to this problem have been pro- 
posed, which often differ in a number of important details. There are nonetheless 
issues which pertain to data assimilation in general and which are independent 
of the specifics of a particular algorithm. The fact that different communities 
have different names for data assimilation often obscures the similarities be- 
tween the approaches. A fundamental issue is that in all but the most fortunate 
situations, data assimilation will have to find a trade-off between two basic yet 
mutually incompatible objectives: Finding a trajectory which is close to the 
observations versus finding a trajectory which is close to being an orbit of the 
model. The commonplace that all models are wrong implies that indeed these 
two objectives cannot be reached at the same time and that the trade-off is a 
nontrivial one. In the present contribution, this trade-off of data assimilation 
will simply be referred to as the trade-off. 

In this paper, a general criterion is proposed and investigated by which this 
trade-off can be settled. Any such criterion obviously needs some justification, 
but typically a mathematical justification cannot be given without introducing 
additional principles or dicta which, by themselves, have no foundation other 
than appearing reasonable or providing useful results. Ideally, data assimilation 
should provide solutions which are close to the trajectories of some imaginary 
system which is thought of as having generated the observation. Unfortunately, 
this is not an operationally feasible approach, since such trajectories are not 
available, and if they were, there would be no need for data assimilation. Even 
then, the trajectories of the underlying true system (if it existed) and those of the 
model might not even be comparable. For instance, the respective state space 
dimensions of model and true system might be completely different, rendering 
a comparison impossible. 

Instead, a minimum out-of-sample error is proposed as a criterion to settle 
the aforementioned trade-off in data assimilation. This quantity is not unsim- 
ilar to the out-of-sample error concept from statistics. The observations are 
assumed to be corrupted by random (although not necessarily white) perturba- 
tions. If noise corrupted data is assimilated into the model, the result should 
be close to the observations, even if the latter were corrupted with a different 
realisation of the noise; in other words, the result should be close to hypothet- 
ical observations with independent errors. The out-of-sample error quantifies 
the extent to which this is the case. Alternatively, the out-of-sample error can 
be considered as the error with respect to the true (unperturbed) observations, 
plus a constant (the variance of the observational noise). 

What renders the out-of-sample error interesting from an application point 
of view is that it can be estimated from quantities which are, at least in prin- 
ciple, available in operational situations. More specifically, it is shown that 
the out-of-sample error is the sum of the tracking error and a quantity in- 
volving the so-called sensitivity. Sensitivity concepts have alr eady been stud- 



i ed in yarious publications on data assimilatio n, for example ICardinali et al 



(|2004l ); lLiu and Kalnavl (|2008l) : iLiu et al.l (|2009t ). for the special case of 3D-Var 
and 4D-Var in discrete time. In the present work, the concepts of sensitivity 
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and out-of-sample error will be considered in the context of data assimilation 
schemes in continuous time. The sensitivity is a gene ralisation of the tra ce of 
the hat matrix known from linear regression (see e.g. lHastie et all 120011) . In 



that context, the relation between sensitivity and out-of-sample error is well 
known as the C p -criterion. The ideas developed in the present paper, outlined 
in Section [21 can be considered as generalisations of these concepts. 

How the presented approach could be employed is explained in detail in 
the context of two approaches to data assimi lation. In Section EH a general 
variational data assimilation technique (cf e.g. Le Dimet and Talagrandl Il986t 



Courtier and TalagrandLll987tlDerberl . ll989HApte et all l2008Ujuddl . 120081: iBrocker . 
2010h is considered. This variational approach leads to two-point boundary 



value problems, which can be solved either sequentially in time or using the so- 
called collocation method. It is demonstrated how to compute the sensitivity 
in both cases with minimum additional costs by reusing quantities which have 
been calculated already during the data assimilation procedure proper. As a 
further e xamp le, data assimilation through synchronisation (aka nudging, see 
ISeamanl . [l990) is investigated in Section @] In this situation, the sensitivity is 
essentially given by the coupling parameter. Finally, numerical examples are 
studied in Section E3 demonstrating the feasibility of the approach. 



2 The out— of— sample performance of data as- 
similation 

2.1 A prelude 

The purpose of this subsection is to provide a smooth entry to those readers 
who are used to think in the context of three and four dimensional variational 
assimilation in discrete time. It might also serve as a reminder of a few basic 
facts from the theory of linear estimation which will be of use in later discussions. 
Some notation will be introduced on the way. 

Suppose we are making an observation ry, which we assume to be some 
quantity of interest £ plus some noise, that is, r\ = £ + r, where r, the noise, is 
random with mean Er = and variance Er 2 = s. An estimator is a function 
7/(77) j with the hope that 7/(77) = £. Obviously, this cannot be exactly true. A 
useful measure of estimator performance is the mean square error. The mean 
square error admits an instructive decomposition (the term on the left hand 
side is our definition of the mean square error): 

nc-vf = (Q-y) 2 + nv-yf (i) 

with y = £7/(77). The first and second terms in ([I]) are called bias and variance 
of y, respectively. If y = the estimator is called unbiased. 

An estimator is linear if 7/(77) — Hrj. (Here, H is but a scalar; in general, it 
is obviously a matrix.) For linear estimators, y — HC,; the variance of y is then 
^ [y — y] — H 2 s. A popular linear estimator is the least squares estimator, 
which obtains as the minimum of the function A(y) — (77 — y) 2 . In the present 
trivial case, this is 7/(77) = V- Clearly, this estimator is linear, unbiased, and has 
variance s. The Gauss-Markov theorem states that the least squares estimator 
has minimum variance among all unbiased linear estimators, a property often 
referred to as BLUE (best linear unbiased estimator). 
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It is possible though to find estimators which are biased but nonetheless 
sometimes have a smaller mean square error than BLUE. We said "sometimes" 
here because for a biased estimator, the mean square error depends also on the 
unknown £, and so does its performance compared to BLUE. The potential of 
biased estimators may be illustrated with the following example. Suppose we 
are not completely clueless as to the possible values of £, but we have a rough 
first guess r/Q. To employ this first guess, we determine our estimator as the 
minimum of the modified function 

A(y) = a-( V -y) 2 + (l-a)-(y- m ) 2 , (2) 

where a is a new parameter between zero and one. Our estimator becomes 

y(v) = a V + (1 -a)r) . (3) 

Clearly, this estimator is biased. Roughly speaking, a ought to represent our 
confidence in the quality of the observation r\ versus our confidence in the first 
guess rjQ] for a — > 1 (perfect confidence), we get y(rj) = rj, while a — > (no 
confidence) gives 2/(77) = 770- 
A simple calculation gives 

Mean Square Error = (1 - a) 2 (( - 770) 2 + a 2 s. 

It is easily seen that choosing 

«=J C ~f (4) 
(C " Vo) 2 + s 



yields a minimum error of 



< s. 



s ^ (C-no) 2 

Although the optimal a cannot be determined practically, since it involves the 
unknown quantity £, we see that there are potentially better estimators than 
BLUE. It thus seems reasonable to look for ways to determine the optimal a at 
least approximately. We will do this presently, but first note that our estimator 
could alternatively be interpreted as another instance of BLUE, namely, if 770 is 
regarded as a second observation of C- More precisely, suppose that 770 = C + r o> 
with Ero = and Erg = sq. This interpretation is often imposed in the data 
assimilation literature, where rjo is termed the background. It can be shown 
that the estimator in Equation ([3]) is the BLUE estimator of £, provided that 
= SSL . Indeed, the estimator has Mean Square Error = (1 — a) 2 SQ + a 2 s, 
and this expression is minimal for the said a. In other words, the estimator 
is BLUE only if in @, the two terms are weighted according to the respective 
error covariances of 77 and 770, which are often referred to as the observation 
and background error covariances, respectively. We get from this discussion that 
there are different ways to motivate the minimizer of the function in Equation (J2J) 
as an estimator; either as an estimator which is biased towards a first guess, or 
alternatively as an unbiased estimator where the first guess is treated as another 
observation. We will again encounter this situation in the case of variational 
data assimilation. 

Whatever the motivation for the estimator, there is the problem of having 
to determine a good a, ideally one that minimizes the mean square error. One 
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possible approach would be to use estimators of s and so in order to set a. 
Besides being a formidable problem on its own, it would force us to interprete 770 
as a random variable. An alternative route, pursued here, is to find a reasonable 
substitute for the mean square error, and subsequently approximate the optimal 
a by minimizing that substitute. We will employ the out-of-sample error as a 
substitute. Imagine we had another observation ?/' = C + with £ still the same 
but with r' being another realisation of the observational noise, that is Mr' = 0, 
Er' 2 = s, but Err' = 0, so r and r' are uncorrelated. With these definitions, we 
define the out-of-sample error as 

E m =E[rf - 2/(77)]' . 

By elementary manipulation 

E om =E[(-y( V )] 2 + s, (5) 

so that the out-of-sample error differs from the mean square error merely by a 
constant s. On the other hand, we have 

E aos = E [17 — y( V )} 2 + 2 Cov [77, 2/(77)] . (6) 

To see this, first note that for any random variables a, b, 

E [a - b} 2 = E [a - of + E [b - b] 2 + (a - b) 2 

- 2 Cov [a, b] . 

To prove Equation (J6j) , apply the previous relation to both E [77 — 2/(77)] 2 and 
E [77' — y(rj)\ , noting that 2/(77) is correlated with 77 but not with 77'. The quantity 
S = Cov [77, 2/(77)] /s will be referred to as the sensitivity from now on. The 
sensitivity should be understood as the correlation between r filtered through the 
estimator y(rf), and r itself. This correlation being large indicates that changes 
of the input will cause large changes of the output, while a low correlation 
indicates the opposite; hence the term "sensitivity" . 

We will now use the out-of-sample error to determine a for the specific 
estimator 2/(77) as defined in Equation ([3]). In the present situation, the second 
term in Equation © becomes 

2 Cov [77, 2/(77)] = 2as. 

So far, these have been exact calculations, but now we apply an approximation to 
the first term in Equation ©; we simply replace it by (77 — y(r])) 2 ■ Equation ([5]) 
thus becomes 

E [7/ - y( V )} 2 = (1 - a) 2 ( V - m ) 2 + 2as. 
This expression is minimal for 

(V - Vo) 2 - s , , 

a=— — . 7) 

iv - Vo) 2 

Besides the noise strength s, this expression contains only available quantities. 
Hence, given a rough estimate of s, this expression can be used as a guide to set 
a appropriately. Expression (O should be compared with the optimal a in Q; 
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within the same approximation that we did above, (rj — t]q) 2 = s + (£ — ryo) 2 , so 
that 

(V - Vof - s _ (C - Vo) 2 
(rj - r] ) 2 s + (C - m) 2 

That is, within our approximation, we obtain the optimal a (Equ. [¥]) for the 
true mean square error. If, alternatively, 770 is interpreted as another observation 
with mean square error sq, then 

a _ (v - m) 2 - s ^ (C - Vo) 2 ^ £0 

1 — a s s s 

Hence the a we have determined by minimising the out-of-sample error is an 
approximation to the optimal a for the BLUE estimator. 

The optimal a can be interpreted as an optimal weighting between the infor- 
mation provided by our first guess r/o and the noise polluted observation rj. The 
main aim of this work is to extend these concepts to data assimilation problems. 
Equation ([6]) in particular will play a central role in this contribution. Since 
data assimilation is dynamical in character though, the error functional should 
take this into account. Hence there should be terms in Equation @ that reflect 
our first guess that rj is the result of some dynamical process; in particular, 77 
and y are no longer just scalars, but pieces of trajectories. Our error functional 
will therefore be more complicated than in Equation ([2]). Furthermore, our 
estimators will be nonlinear, which necessitates further approximations. 

2.2 Data assimilation 

We suppose that a time series {rjt,t G [t s ,tf]}, referred to as the observations, 
has been recorded, where rjt & K d for all t G [t s ,tf]. We will often write rj 
(without time index) as an abbreviation for the entire time series {rjt, t S [t s , tf]} 
of observations, and similarly for other time series. The time interval will always 
be [t s ,tf], unless explicitely stated otherwise. 

Data assimilation is a procedure by which trajectories {x t € R D ,i G 
are computed which are orbits of some dynamical model, at least up to some 
degree of accuracy. The exact form of the model or the assimilation procedure 
is not important at the moment. Furthermore, the orbits should reproduce the 
observations in the following sense: There is a function C : M. D — >■ M. d (which 
can be considered part of the model) so that the output y t = C(x t ) is close 
to rjt up to some degree of accuracy. In order to keep things simple, we will 
continue to work with the mean square error as a measure of closeness. That is, 
we measure the deviation of the output yt from the observations rjt by means of 
the tracking error 

A T := J\ Vt -yt) T W{rjt-yt)&t, (8) 

where W is some positive definite matrix. 

Unless the model is perfect or at least allows to shadow the true dynamics for 
long times, data assimilation will have to balance between finding a trajectory 
which is close to the observations, and finding a trajectory which is close to 
being an orbit of the model. In this section, a criterion is proposed which 
allows to settle the trade-off. The first assumption we need in order to render 
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the criterion applicable is that the data assimilation under concern is able to 
explore the trade-off. More specifically, we assume that there is a parameter 
a £ [0, 1] so that when a = 0, the data assimilation produces trajectories which 
are orbits of the model while paying minimal attention to the observations. 
When a = 1, then the data assimilation is assumed to follow the observations 
as close as possible, while paying minimal attention to the model dynamics. 
This parameter will be referred to as the sensitivity parameter from now on, for 
reasons that will become clear later. Arguably, any data assimilation should 
possess such a knob, even though it is often hidden behind the fascia of the 
algorithm. The specific approaches to data assimilation considered later should 
illustrate this point. In Subsection 12.11 we encountered a trivial version of this 
trade off in which the first guess 770 played the role of the "model" for the 
observations. There, the knob for the trade-off was given by a. 

In order to distinguish points on the trade-off, we consider some form of out- 
of-sample error. For every t £ [t s ,tf], data assimilation provides an operator 
Y t which maps the entire time series of observations 77 onto y t = Y t {rf). In fact, 
the operators Y t depend also on the sensitivity parameter a, which will become 
important later. Until then, the dependence on a will not be made explicit, 
in order to avoid notational clutter. Note that the role of Y t played here is 
analogous to that of 2/(77) in Subsection 12.11 Roughly speaking, we want the 
operators Y t to separate 77 into a "desired" and an "undesired" part. To this 
end, we assume that r\ t can be written as f] t = Cf + r * for all t 6 [t s ,tf], where (t 
is the desired and an r t the undesired part. The result of the data assimilation 
can now be written as y t = Y t (( + r) for all t 6 [t s ,tf]. 

To define the out-of-sample error, we assume that r, henceforth called the 
observational noise, is some stochastic process with zero mean, and that r and 
C are uncorrelated. (This does not necessarily mean that r is a white noise 
process, nor that it is particularly irregular.) Now let 77^ = Q + r' t , where r' 
has exactly the same stochastic characteristics as r but is independent from the 
latter. We define the out-of-sample error as 



where yt = Yt(C + t) (note the absence of the dash on r). The expectation E in 
Equation ([9]) affects both r and r'. 

The out-of-sample error can be considered as measuring some kind of uni- 
versality property of our data assimilation algorithm. Since r' is uncorrelated 
to both C and y, we have (analogously to Equ.O 



where po is the variance of the noise (which we assume to be independent of 
t; see also Equ. [T5|) . Hence the second term in Equation (flU)) depends on the 




(9) 




(10) 



The cross terms cancel. The second term can be written as 
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noise alone and is not affected by the data assimilation. The first term of 
Equation (fTUj) . 

Aa ■= J ' E [(C t - y t ) T W(C t - yt)] At 

will be referred to as the assimilation error from now on. The reader should 
note the analogy between the assimilation error A\ and the mean-square-error 
of Subsection 12.11 Equation (TTJ) . 

2.3 Calculating the out— of— sample error 

Typically, data assimilation algorithms work by minimising some error func- 
tional which includes some form of tracking error At (Equ. [5]) plus some hard 
or soft constraints which take into account the dynamical character of the prob- 
lem. Invoking some kind of stationarity though, we can reasonably hope that 
the variations of At are in fact small, that is, 

EA T = A T = J 1 (rjt - y t ) T W{m - y t )dt. (11) 

Analogously to Equation (O in Section |2~T1 we should not expect the tracking 
error to be even approximately equal to the out-of-sample error, since the 
integrand in At involves the difference rjt — yt — Ct + rt — Y t (C + r), while the 
out-of-sample error uses rj' t — yt — Ct + f' t — It (C + r ) i note the distribution of 
dashes. More specifically, we have 

E [(r,' t - y t ) T W{rl t - y t )] = E [( Vt - y t ) T W{ m - yt)} 

+ 2tr(WCov[y t ,r lt }), 

with y t — E [y t ]. The proof is the same as for ©: expand the quadratic terms on 
both sides and note that y t is correlated with rj but not with rj' . By integrating 
over time, we get 

ft, 

E oas = EA T + 2 J tr (W Cov [y t , %]) dt. 
We will write this as 

E OOB = EA T + 2 tr(WSp ), (12) 

with 

po := Er t rf (13) 
being the variance of the noise, and 

S--=J t f Covbte.TjdPo'd*. (14) 

which we again refer to as the sensitivity. Note that the sensitivity is essentially 
the integral of a correlation and thus has dimension time. 

What renders Equation (fT2)l interesting is that it, at least in principle, opens 
up the possibility of approximating E oos operationally. To this end, the approx- 
imation pip would be used for the tracking error, while for the second term, 
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the statistics of r would be required as well as the sensitivity. In practice, an 
exact calculation of S might be a serious difficulty given the complexity of data 
assimilation algorithms. In order to get any further, we assume that the effect of 
r on the reconstructed observations y can be described through a linear analysis. 
More specifically, 

yt = Y t (C + r)°iY t (Q+dY t (ri)*r (15) 

where dY t (ri) is a linear operator, describing the first order response of the 
operator Y t at 77. Taking the linearisation at r\ rather than at C is maybe less 
customary, but the error commited by either choice is of the same order. By 
dY t (r)) * r, we denote the application of that linear operator to r. From this 
assumption, we get that to first order 

yt = Y t (t). (16) 

Since 

Cov [y t , m ] = E [(y t - y t ) T ( m - fjt)] = E [(yt - ytfn) 
we obtain for the sensitivity by substituting with (fT5l [T5| in (fH)) 



S^j E[(dY t ( V )*r)rf]p^dt. (17) 

This linear approximation will be at the basis of subsequent calculations of the 
sensitivity. 

As was mentioned in the introduction, a related sensitivity conc e pt ha s 



been studied in the li t erature already, for example in ICardinali et al.l (|2004) 



Liu and Kalnavl (|2008h : lLiu et al.l ((2009). As far as these studies pertain to 3D 



Var, the current output y t at time t is assumed to be linearly regressed from 
the current observation rjt and the background x t which contains all previous 
information. Only the influence of the current observation r\ t onto yt is taken 
into account. The effect of previous observations on y t (via the background) is 
not studied. In contrast, the sensitivity as defined in the present paper consid- 
ers the influence of the entire history of observations (past and future) onto the 
output yt- 

Our original motivation for studying the out-of-sample error and the sen- 
sitivity was to get a handle on the trade-off of data assimilation. The precise 
relation between the sensitivity and what was called the sensitivity parameter 
before obviously depends on the details of the data assimilation scheme. It is 
to be expected though that, in any event, the sensitivity is intimately related 
to the trade-off: to create solutions that track the observations closely, a large 
sensitivity is needed, which however introduces dynamical errors. Computing 
trajectories which adhere to the model dynamics on the other hand is only 
possible with a low sensitivity. 

In order to compute S in practice, firstly the linearisations of the operators Y t 
about T) have to be calculated. These are often already computed as part of the 
data assimilation procedure itself, in which case this typically rather burdensome 
task need not be performed twice. Again, the details depend on the specific 
data assimilation technique. Two general approaches to data assimilation will 
be considered in the next two sections, and ways to compute the sensitivity will 
be discussed. 
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The correlation structure of r has to be known at least to some degree. In 
fact, there is nothing really to be known here, but rather a decision has to be 
made which parts of the observations are to be considered of interest and which 
parts are to be considered noise. In the next subsection, an approach to this 
problem is discussed which hopefully covers a large range of applications. 



2.4 Assumptions on the noise r 

The following assumptions on r, which typically apply in practical situations, 
allow to simplify the calculation of the sensitivity in the cases considered in this 
paper. 

1. r is sampled from a signal v with sampling interval At and subsequently 
interpolated. 

2. v is stationary in the wide sense. 

3. is small with respect to the bandwidth of v. 

It follows from assumption [2] that the covariance function ¥,v t v^ depends only 
on t — s. Furthermore, v has a well defined power spectrum. Now let 

Pts := E [r t rj] . 

It follows from assumption [T] that also pt jS depends on t — s only, and that r 
has a well defined power spectrum. This power spectrum is confined to a band 
of width 257- Furthermore, during sampling, all power of v beyond the critical 
frequency will be aliased into that band. Hence and from O it follows that 
r has still power near the critical frequency In summary, we can conclude 
that the power spectral density of r can, in good approximation, be written as 

I otherwise. 

In effect, we assume r t to be band limited white noise. For a discussion how 
band limited white noise arises through sampling white noise, see lAstrcVml ( 2006h . 



If there is good reason to believe that assumption [3] does not hold, and 257 is 
on the same order of magnitude as the bandwidth of v or larger, then Equa- 
tion (|18p might still be a reasonable approximation, provided At is replaced by 
the bandwidth of v. 

In this paper, we will in fact not employ the specific form of the correlation 
function (|18[) . The only thing we require is the validity of the following two 
approximations (see I Astroml . 20061 Ch. 2, Sec. 5). Suppose that is a function 



on some interval [£1,^2] which varies slowly compared to At. If t is well inside 
the interval [ti,^], then 

(f) T p T ~tdT = p At(f> t (19) 



If however t — ti , then due to the symmetry of p, 



^rPr-tdr = ipoA^t- (20) 
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3 Example 1: Variational data assimilation 



In this section, a class of data assimilation approaches, often referred to as 
variational data assimilation, will be considered. After recalling the basics of 
this approach, we will indentify the sensitivity parameter a. In order to set a 
with the help of the out-of-sample error, we need to calculate the sensitivity. 
This will essentially occupy the present section. 
We assume a model of the form 

it = f{x t ) + u t , y t = Cx t (21) 

with state Xt € M. D and output yt E M. d . The model dynamics f is a vector field 
on some open subset of R D , C is a d x D-matrix. The model dynamics should 
be thought of as capturing our a priori knowledge of the physical mechanisms 
underlying the observations rj. For this reason, the u t € M. D will be referred to 
as the dynamical perturbations. Variational Data Assimilation attempts to find 
state and dynamical perturbation trajectories {(xt,Ut),t G [t s ,i/]} satisfying 
the relations (|21[) so that the action integral 



A a {x, u} := - I (r) t - Vt) R{m ~ Vt)dt 



I- a 



ujQutdt (22) 



is minimal. Here, R and Q are positive definite matrices which might be nec- 
essary to appropriately scale different degrees of free dom. This approach has 
been s t udied in various publications, se e for example iLe Dimet and Talaerandl 
(|l986f k lCourtier and Talaerandl (|l987t l: iDerberl (jl989h : lApte et all (|2008h 



We shall now digress for two paragraphs and discuss the various arguments 
which have been put forward as to why an action integral of the form (|23|) 
should be used. In the case of linear dynamics, the variational approach (to 
be described below) in conjunction with the action integral (|2"2"j) leads, some- 
what coincidentially, to the Kalman filter and smoother. This fact, which is a 
consequence of Kalman's duality theorem, can be seen as an extension of th e 
BLUE concept mentioned in Subsecion 12.11 (see Jazwinskv . 1970t Sagel 19681) . 



These facts do no longer hold if the dynamics are nonlinear. Yet an alternative 
interpretation o f the action integ ral is as the logarithmic aposteriori density 



of a trajectory (jJazwinskvl . 119701) . given the observations. In connection with 



this interpretation though, it is necessary to keep the following reservations in 
mind. Firstly, the perturbations to the dynamics and the observations need to 
be gaussian and uncorrelated in order for this interpretation to apply. Secondly, 
R and Q have to be the inverse covariance matrices of the observational and the 
dynamical perturbations, respectively (and a = 1/2); otherwise, the functional 
we minimise is not the logarithmic aposteriori. Thirdly, in a nonlinear context, 
the maximum-aposteriori estimator does not necessarily yield a minimum mean 
square error. Finally, the ver y concept of a "density" req uires substantial mod- 



square error, finally, trie ver y concept or a density req uires substantial mod- 
ifications in continuous time; Zeitouni and Dembol ()l987l ) discuss an aposteriori 



which, in general, looks different from Equation (I22[) . (The numerical examples 
we will discuss later happen to be an exception; see the discussion in Sec. [5j) 

Without invoking statistical concepts, we contend that an error functional 
of the form (12"2")) might still be justified for the purpose of regularisation. For 
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a genera l discussion an d motivation of this view (adopted here), we refer the 
reader to lBrockerl ( 2010f ). The parameter a, which is analogous to the parameter 



a in Subsection 12. 11 is interpreted a regularisation parameter. (It is fair to say 
that under this paradigm, there is no a priori reason to use a quadratic measure 
of error, which is used here mainly for mathematical convenience.) 

Whichever view on the interpretation of the functional A a is adopted, there 
remains the problem of setting a. The parameter a controls the weighting be- 
tween the two contributions to the action integral, namely between the tracking 
error 

rt 



A T = ^J^( Vt -y t ) T R(r lt - yt )dt 



and the model error 



1 f tf 

Am = ^ J ufQutdt. 

Note that the tracking error was already defined as a diagnostic in Equation ((5J , 
albeit with a weighting matrix called W. That tracking error might be referred 
to as the diagnostic tracking error. There is no need though for using the 
same weighting matrix in the diagnostic tracking error and in the functional A, 
whence we will keep them distinct. 

For the variational approach, a is a sensitivity parameter in accordance with 
the definition given in Subsection 12.21 If a is close to one, there is almost no 
penalty on the dynamical perturbations. The problem of minimising the action 
integral then becomes a very easy one, as arbitrary dynamical perturbations 
can be used to make the tracking error small. The solution then follows the 
observations very closely, but in general it will not be a good solution of the 
model dynamics, as u t will be large. If a is close to zero, there is almost no 
penalty on the observations. (The problem of minimising the action integral 
in this situation is generally not very easy, since the problem tends to have a 
very poor condition; none theless, accept able solutions for small a can be found 
through continuation, see iBrocken, 120101 ) . The solution will be a good solution 
of the model dynamics in the sense that Ut is small in this situation, but in 
general there will be deviations from the observations. In summary, varying a 
allows for exploring the trade-off between finding a trajectory which is close to 
the observations, and finding a trajectory which is close to being an orbit of 
the model. In principle, the matrices i?, S can be considered an entire set of 
sensitivity parameters which could be determined by the approach proposed in 
this paper. In the examples studied in this paper though, we will work with fixed 
matrices R, S and study the dependence on the scalar sensitivity parameter a 
only. 

To compute the out-of-sample error by means of the sensitivity, some re- 
marks are necessary as to how to solve the variationa l data assim ilation problem. 
The following results are classical, see for example [Sage variational 



problem under constraints can be transformed into a variational problem with- 
out constraints by introducing dual variables or Lagrange multipliers. More 
specifically, the problem above is equivalent to finding a stationary point (with- 
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out constraints) of the augmented action 



A a {x t ,u u X t ,rj t } := / H a {x u u u \ u rj t )dt 



f 

Xfx t dt (23) 

over {(xt, u t , Xt),t G [t s , tf]} for a fixed time series 77, where the Hamiltonian is 
given by 

H a (x,u,\ 7 ri) = ^(ri - Cx) T R(i] - Cx) 



+ ^-u 1 Qu + X 1 (f(x)+u). 

Since no derivatives of u t appear, the minimisation over u t can be immediately 
effected, leading to the criterion 

d u H a = for all t, 

which gives 

u t = Q _1 A t . 

1 — a 

Substituting with this expression for it, we obtain 



H Q (x,X, V ) = ~{ri - Cx) T R{ V - Cx) 



2(1 



-X T Q- 1 X + X T f(x). (24) 



as the definitive expression for the Hamiltonian. The following equations are 
necessary and sufficient conditions for {(x, A)} to be a stationary trajectory of 
the augmented action (j23|) : 

d x H a -x = 0, (25) 
d x H a + X = 0, (26) 
At s = 0, A t/ = 0. (27) 

The Equations (|25I26I) describe the dynamical evolution of the states x and the 
co-states A. The conditions in Equation (|2"7T) represent boundary conditions. 
Depending on the specific circumstances, other boundary conditions might be 
appropriate. For example, imposing (hard or soft) initial or terminal conditions 
on xt in the problem formulation ( "background error" ) would lead to modi- 
fications in the boundary conditions (|2"?| . Such modifications though do not 
change the general character of the problem. To keep this discussion as short 
as possible, we will work with the simple boundary conditions (1271) . 

For the specific Hamiltonian (j24|) the necessary conditions (|25TI27|) read as 



x t - f(x t ) - -^Q^Xt, (28) 
1 — a 

X t = -Df(xt) T X t + aC T R{n t -Cx t ), (29) 

A t , = 0, A t , = 0. (30) 
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The same formalism can be applied to much more general setups, such as more 
general integrants in the action as well as more general dynamical systems. 
With the appropriate Hamiltonian H, the necessary conditions for an optimum 
always look like Equations (|25I26[) . 

Instead of an initial value problem where the state (xt,\t) is specified at 
t = t s , we face a two-point boundary value problem or BVP for shorlQ, in 
which the state is specified partly at the initial time and partly at the terminal 
time. BVP's require other solution algorithms than initial value problems, and 
a large variety of numerical techniques have been developed. Our aim though 
is to solve not only the Hamiltonian BVP but also to calculate the sensitivity. 
Therefore, in order to save computational resources, quantities that have already 
been calculated during the BVP solving should be reused as much as possible. 
Considering every conceivable BVP solver and demonstrating how it could be 
extended to yield also the sensitivity is obviously beyond the scope of this paper. 
Discussing two general approaches to BVP solving will have to be sufficient. 
We hope that these two examples are general enough to be transferable to more 
specialised BVP solving approaches. 

A very general method to solving BVP's is the collocation method, consid- 
ered in Subsection l3.ll In this approach, the BVP is approximated on a discrete 
time mesh by a set of nonlinear equations, the so-called collocation equations. 
The sensitivity is related to the Jacobian of the collocation equations at the opti- 
mum. If the Jacobian calculated during the solution of the collocation equations 
can be recycled, then the sensitivity is obtained with close to no additional com- 
putational effort. Subsection 13. II also contains further references on this topic. 

In the collocation approach, the entire set of collocation equations is solved 
simultaneously. This renders the collocation approach numerically expensive 
for large (e.g. spatially extended) systems, even if the fact is exploited that the 
linear approximation is typically a sparse matrix. In these situations, sequential 
approaches are presumably more economical. Subsection 13.21 considers such an 
approach. It is based on the fact that the linear response of the Hamiltonian 
BVP (with respect to perturbations of the inputs) can be described by a linear 
Hamiltonian BVP. The solution of the latter BVP (and subsequently, the sensi- 
tivity) is obtained by consecutively solving matrix valued differential equations 
of Rlcatti type. 



3.1 The sensitivity through the Jacobian of the collocation 
equations 

We start with explainin g the general idea of the colloca tion method. For more 
details, see for example Kierzenka and Shampinel ( 2001 ). which contains a dis- 



cussion of the bvp4c-algorithm implemented in Matlab that uses the colloca- 
tion method. For the moment, we are not using the fact that the vector field 
is Hamiltonian, so we will be assuming (until further notice) that the BVP is 
given as 

zt = F{t,z t ), te[t s ,t f ], b(z tB ,z tf ) = 0, (31) 

with z t — (x t ,\t) £ M. 2D , F a vector field on M. 2D and &(., ..) a function repre- 
senting the boundary conditions. The general strategy of most BVP solvers is 



1 All boundary value problems in this paper are two-point, so using the general abbreviation 
BVP should not give rise to confusion 
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to approximate z t at a series t s = to < . . . < t^ = tf of temporal mesh points; 
we will write z :— (zt , ■ ■ . , Zt N ) for the values of the solution at these points. 
Usually, z is obtained by solving a set of equations 

$,(2)=0, i = 0...N, 

called collocation equations. The collocation equations are effectively discrete 
time approximations of the original differential equation as well as the boundary 

conditions in (|3"Tj) . More specifically, the $i are functions of Fi = F(tj, Zt,) and 

the bo undary conditions b(zt , z t N ) . For the BVP solver bvp4c bv lKierzenka and Shampmel 
(2001) implemented in Matlab, the explicit form of $ is given in the Appendix. 



Typically, the collocation equations are solved using a quasi-Newton type algo- 
rithm, which involves a numerical approximation to the Jacobian of the collo- 
cation equations. 

Once available, the Jacobian can be re-used to compute the sensitivity, as 
will be explained now. As was explained in Subsection 13.21 a perturbation r 
added to the observations will result in a perturbation of the Hamiltonian vector 
field (which we have written as F in this subsection), which in turn will entail 
perturbations of the solution (which we have written as z in this subsection). In 
terms of the collocation equations, it means that we have a slightly perturbed 
solution z + 8z of some slightly perturbed collocation equation 

(3> + 5<I>)(z + Sz) = 0. 

Expanding the left hand side and keeping only terms up to linear order in the 
small quantities i5$ and 5z, we obtain 

= <P(z) + D>$>{z)5z + <5$(z). 

The first term vanishes since z is a solution of the unperturbed collocation 
equations, per assumption. Solving for 8z gives 

Sz = (32) 

Since the perturbation S$(z_) of the collocation equations is due to perturbations 
SF of the vector field, we have to first order 

s *i& = T,w® 5Fi ' (33) 

It remains to express 6Fj in terms of r. To this end, we have to resort to the 
particular form of F given in Equations (l2"5t and (|2T))) . It follows that 

SF, = ( ^ ) . (34) 

Therefore, by combining Equations (|32D , (1331) , and (|3~4")l , along with the fact that 
dY ti — (C, 0) Szi, we get that 

= -E( C '°) -^Hffe) • ( C T Q R ) ■ (35) 

j,k 
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For the correlation E [(dY^ * r ) r u\ we get from Equation (f3"5j) 
E [(dY ti * r)rl] 



_j,k 
3>K 




Using the trapezoidal rule over the time mesh of the BVP solver, we obtain the 
following approximation to the sensitivity: 



with hi = ti + \ — ti. Computing the sensitivity through this relation becomes 
attractive if the inverse D^^ 1 of the collocation Jacobian is already available 
from the BVP solver, as this is the most costly step in evaluating Equation f|36|) . 
Secondly, the partial derivatives d$i /dFj can often be extracted from the BVP 
solver as well. The explicit form of the partial derivatives 9$^ / dFj for the BVP 
solver bvp4c is given in the Appendix. 

Equation (|36[) is valid without imposing any specific conditions on the co- 
variance pi. s of the signal r. Under the additional assumptions as outlined in 
Subsection 12.41 many of the off-diagonal terms in the sum over i, k of Equa- 
tion (|36[) will be zero, which might considerably reduce the necessary amount 
of calculations. 

3.2 The sensitivity through solution of a linear BVP 

The general strategy of solving nonlinear problems by solving a series of linear 
problems which approximate the original problem is, in principle, also applicable 
to BVP's. If such a strategy is employed, then Equations of the form (|37I38|) 
below get solved. This means that all essential calculations of the sensitivity 
calculations below have already been carried out and can be recycled. Suppose 
that {x, A} is a solution of the BVP (|25H27I) with r) — (. A perturbation r t added 
to the observations will result in a perturbation of the Hamiltonian vector field, 
which in turn will entail perturbations (£t,Zt) of the original solution (xt,X t ). 
To first order, £ t and It are given by linearisation of the Equations (|25H27p about 
{x, At}, which read as 



S = -J2h-i(C,0)-D$r j 1 —i-(z) 



(36) 




6 
It 



-N t i t -Fjl t -D t r u 



(37) 
(38) 



with the identification 



F t =d xX H(x t ,\t,(t) 
M t = d X 2H(x u \ t ,(t) 
N t = d x 2H(x t ,\ t ,(t) 
D t = d nx H(x tl XtXt) 
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We assume that d v \H — 0, as is the case for the specific Hamiltonian (|24p . The 



boundary conditions for Equations Q37H38P are l t 



0. Having solved the 



Equations (|37ll38|i , the linear response is given by dY t (£) * r = C^ t > whence the 
sensitivity can be written as 



S 



[C&rf] dtp, 1 =C E [6r t T ] d^o 1 



(39) 



We therefore compute the correlation r t := E [^rJT] . It is possible to decouple 
Equations (I37I38P by using invariant imbedding (see e.g. ISaed . 119681 ). In the 
present case, this means to try the approach l t — Pt£t + Mt- Substituting with 
this approach for l t in (|37H38|) and equating like powers in £ t , we obtain 



it = B£ t + M tlH 
At = ~B?fi t - D t r t 
B t := F t + M t P t 

-P t = P t F t + FjP t + P t M t P t + N t , 



(40) 
(41) 
(42) 
(43) 



with conditions p tf — 0, Pt f = 0, £ ts = — P t ~V« s - To solve this system, Equa- 
tion (|4*3")l is integrated first backward in time, simultaneously with Equation (|4"Tjl . 
This is possible since these Equations do not depend on £ and the end conditions 
are given. Then the Equation (|4"0|) is solved forward in time. The solution l t 
of (f3"5)) can be recovered through l t — P t £ t + fi t . The solutions £t and fi t can be 
written as 
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M s p s ds 



(44) 
(45) 



with <fi t a fundamental system of 



Hence, multiplying Equation (|44p with rf from the right and taking E | 



P^E [nt s rj] + J* cfi-Hm [fx s rf] dt 



gives 
(46) 



For s < t, multiplying Equation (|45p with rf from the right and taking E [. . .] 
we get 



E U 8 r t r ] 



6^£> T E [r T rJ] dr 



b T T D T p T ^ t dT. 



As to the correlation structure of rt, we again impose the conditions of Sub- 
section [2111 In particular, we assume that <fi^D T varies slowly compared to At. 
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Thus, Equation (ITO!) can be applied, and we get 



E [ii s rf] 



Atcj) 



Replacing with Equation (|4"T1) in Equation (|4l))) we obtain 



(47) 



r* = At, 



%D t po. 



In fact, this can be written as T t = At J t D t po with J t obeying the matrix valued 
differential equation 



J t = B t J t + JtBi +M t 



(48) 



with initial condition Jf a 
tion (1591) as 



P t . The sensitivity is obtained from Equa- 



A t C 



J t D t dt 



(49) 



To summarise, in order to obtain T t we first need to linearise the Hamiltonian 
equations, then use this data to form the Ricatti equation (|43l) . which has to 
be solved backward in time. Next, the solution P t of the Ricatti equation is 
used to form B t , defined in Equation (|4"!2]) . With this data, Equation (|48D for 
J is integrated forward in time, and the sensitivity is eventually obtained from 
Equation (|49|) . 



4 Example II: Data assimilation through syn- 
chronisation 

Synchronisation between dynamical syste ms is a phenomenon which has been 

under s tudy for some time, see for example Parlitz and Kocarev ( 19991) ; Boccaletti et al 
(|2002l ): |Pikovskv et all (|2003l ). As in Section H Equation flST), let 



xt = /(act) + 1H, Vt = Cx t 



(50) 



a model with dynamical perturbations and output. A coupling between the 
model (|50|) and reality is established by setting 

u t =K-(r) t - y t ), 

that is, the error between the model output and the observations is fed back 
into the model, with K being some suitably chosen coupling matrix. Synchroni- 
sation refers to a situation in which, due to coupling, the error r\t — yt becomes 
small asymptotically, irrespective of the initial conditions for the model. We 
might then hope that x t is in some sense similar to the 'true state of reality'. 
This hope is supported by mathematical results stating that if the reality is 
indeed a dynamical system not too unsimi lar to our model (|18[) . synchronisa- 
tion occurs (see e.g. fPikovskv et al. . 2003 ). Many examples for spontaneous 
synchronisation have been found in nature and engineering. I n the con t ext o f 
data assimilation, an approach known as Newtonian nudging (jSeaman . 1990) 
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can be understood as trying to establish synchronisation between the model and 
some presupposed dynamics generating the observations. More specifically, the 
assimilation is accomplished by integrating the following system 

x t = f(x t )+K( m -Cx t ). (51) 

The question arises how to choose an appropriate coupling. For the experiments 
carried out in this paper, we set K = k-C t , with coupling parameter k. Further- 
more, C was chosen so that CC T = I (the <i-dimensional identity matrix). The 
coupling parameter plays the same role as a in variational data assimilation, 
namely controlling the trade-off between tracking error and dynamical error. 
Therefore, the coupling parameter is the sensitivity parameter in the present 
situation (we could set a — to obtain a sensitivity parameter between zero 
and one). Indeed, from Equations (|50I51[) we obtain 

U t = KC T (l] t -Cx t ), 

which gives 




in the case of R and S being the identity matrix. 

As in variational data assimilation, a possible criterion for choosing k could 
be a minimal out-of-sample error. To this end, we need to calculate the sensi- 
tivity of the synchronisation approach. The response of xt to small changes Tt 
in the observations is, to first order, described by 

i t = (Df(x t )-KC)^ t +Krt (52) 

Let 4>t be the fundamental system of the linear part, that is, 

j H = {Df{x t )-KC)<t> t . 

The solution of (l52|) with £t s = can be written as 

from which we get 

r t = E [&rf ] =<ptj ^Kps-t ds £ ^Kp At (53) 

Again, we have assumed the correlation structure p T _t for r as discussed in 
Subsection 12.41 and also that (fij 1 is slowly varying compared to At, whence 
Equation (j20|) applies. Eventually, the sensitivity is obtained as 

Ttdtp^ 1 ^— (t f -t s )CK 
= ^(t f -t s ) K CC T 
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5 Numerical experiments 



Several numerical experiments were performed, with the aim of testing the 
methodology, in particular the equivalence of the approaches presented in Sec- 
tion |31 and the validity of the linear approximation (j 171) . Detailed experiments 
were carried out using the Lorenz'96 system, which will be discussed in more de- 
tail here. The following experimental setup was use d: The "reality" is given by 
the Lorenz'96 system with two scales ( Lorenj . 1996 ). described by the following 



equations: 

Xt = -AV 1(^-2 - X l+1 ) -X t +F- jZ h (54) 

M 

Zi = S Zjj , (55) 

3=1 

Zi,j = -aiZi,j+i(zi,j+2 - - d2Zij + Xi, (56) 

and the corresponding observations 

Vk = ^c kt (Xi +n), 

i 

where a\ — 100, 0,2 — 10, Xi and Zij denote the slow and the fast degrees 
of freedom respectively, F = 18, 7 is a constant quantifying the influence of 
the fast degrees of freedom onto the slow ones, and r is short time correlated 
Gaussian noise with E[ri(t)rj(t)] = poSij. Two cases of C = {%■} are considered. 
In the first case, observations are available from all degrees of freedom, that is, 
Cij = Si j. In the second case, only every second degree of freedom was observed, 
that is, — 821,3 ■ The index of the slow degrees of freedom as well as the first 
index of the fast ones takes the values % = 1, . . . , L, periodic boundary conditions 
being assumed. The second index of the fast degrees of freedom takes the values 
j = 1,...,M, where z i<M +i = Zi+1,1, z i>M +2 = Zi+1,2, and z lfi = Zi-\,M, for 
i = 1 . . . L. In all the examples considered here, L = 64 and M = 8. In any 
event, only slow degrees of freedom are observed. 

Simulations of the reality have been generated by integrating Equations (|54I - 
|56| by means of a simple Euler scheme with integration step St = 10~ 5 . Ob- 
servations rji(t) were sampled with At = 5.12 • 10~ 3 only. The corresponding 
sampling frequency is still sufficiently large, compared to the bandwidth of the 
signal. 

As the assimilating model we use the Lorenz'96 system with one scale only 
(i.e. no fast degrees of freedom) 

±i = -Xi^x{xi-2 — Xi+i) - Xi + F, (57) 

Vi = ' S >^ J CkiXi- (58) 

i 

This setup is motivated by practical situations in which typically high frequency 
modes living on small scales cannot be taken into account, due to limited com- 
putational power and impossibility of acquiring data at such small scales. Note 
that in the example presented here, we have chosen the same forcing term F for 
both model and reality, whence any model error is due to the absence of the fast 
degrees of freedom in the model. Thus, 7 in Equation (1541) controls the amount 
of model error. 
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In the following, we consider four different scenarios. The first two are given 
by a small noise case with po = 0.01 (corresponding to a SNR of 57dB) and 
7 = 0.01 as well as a large noise case with p = 1 (corresponding to a SNR 
of 16dB) and 7 = 1. Observations were taken of all slow degrees of freedom, 
corresponding to cy = S^j, as discussed above. For the other two scenarios, the 
same combinations of the noise values are considered, but observations are only 
taken of every second slow degree of freedom, that is, c,j = 621 j ■ 



5.1 Variational data assimilation 

The functional to be minimised (|2"31 takes the form 



Jo k i i 



2 



+ 2J Xi(xi + Xi^i{xi- 2 - x i+1 ) +Xi — F- Ui), (59) 

i 

where the total integration time was T = 2 20 6t, the Ui denote the dynamical 
perturbation, and Xi are Lagrange multipliers. In the present numerical exam- 
ple, the matrices R and Q were taken as unit matrices. This is justified since 
the variability of the different dynamical degrees of freedom are expected to be 
similar, and likewise for the output. 

After eliminating u with the help of Mi = — A$/(l — a), the Hamiltonian 
equations resulting from the functional A are given by 

±i = -Xi-x{xi-2 - Xi + x) -Xi+F- ^ , (60) 

1 — a 

A, = Aj+i(a;j_i - x l+2 ) + K+2X1+1 - K-iXi-2 + A, 

+ a}^Cki(r)k -y^kjXj). (61) 
k j 

These equations have been solved by means of a NAG boundary value problem 
solver (D02RAF, see for example Perevral 19791) with boundary values A(0) = 
A(T) = 0. The resolution dt mo del of the BVP solver mesh is in general not 
identical to the sampling interval At of the observations. For intermediate 
times at which no observations exist, rji(t) was interpolated by means of cubic 
splines. 

In order to determine the sensitivity, we proceed according to Section 13.21 
that is, first P is obtained by means of the matrix valued equation (j43]), which 
is integrated backward in time, and subsequently Equation (|48[) is integrated 
forwards in time to get J. In the present situation, the matrices F, M , and N 
are given by 

Fij = -S hJ + S hJ+ i(x i+ i - x l - 2 ) 

+ - Sij +2 )xi-i, (62) 

Mij = (63) 

= CkiCk 3 ~~ ^'ij+iAi+i + ^jj+2Ai_i 

— Sij-iK+2 + <5i,i-2Aj+i, (64) 
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Figure 1: For variational assimilation, i? oos (♦), At (a), sensitivity (•), and Aa 
(■) are plotted vs. In [a/(l — a)], for the small noise complete information (a), 
large noise complete information (b), small noise incomplete information (c), 
and large noise incomplete information (d) cases. The solid vertical line marks 
the value of the coupling constant at which the assimilation error gets minimal, 
a op t • The position of the minimum of £" os coincide perfectly with a opt in all 
cases but (d), where the minimum of E oos is marked by the dashed vertical line. 
Note that, even in the latter case, the deviation of the two minima is small. 
E oos and At have been shifted on the ordinate for better visibility. 



respectively. The sensitivity, S, is eventually obtained according to Equa- 
tion or precisely 

S = -aAt f CJ t C T dt, (65) 

where At is the sampling interval of the observations. Note again that At, in 
general, need not coincide with either the resolution <5t mo dci = 1-28 • 10~ 3 at 
which Equations (|43p and |48|) were integrated, nor with the resolution of the 
the BVP solver's time mesh. 

Figure Q] displays E OOB (diamonds) , computed according to Equation (|12l) , 
approximating the average (diagnostic) tracking error as in (jlip . For this study, 
we set W = R =unity matrix. Also shown is the sensitivity (more specifically, 
2 tr(WSpo), bullets), the tracking error At (triangles) and the assimilation error 
Aa (squares) for the four studied cases (small noise and complete information 
on panel (a); large noise and complete information on panel (b), small noise and 
incomplete information on panel (c) , large noise and incomplete information on 
panel (d)). All quantities are shown versus the sensitivity parameter a. As 
a reminder, the assimilation error A a is given by the first term on the right 
hand side in Equation (JTUJ) . In Figure [U At and E OOB have been shifted on the 
ordinate for better comparison. Furthermore, all quantities shown in this section 
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have been divided by the time span T, and the number of observed degrees of 
freedom d. In other words, all quantities should be interpreted per unit time, 
and per observed degree of freedom. The tracking error as well as assimilation 
error have been integrated using the trapezoidal rule with resolution given by 
At. As expected, At decreases monotonously with a, while the opposite is 
the case for the sensitivity. This alone does not imply a minimum of E oos , for 
which a cancellation of the derivatives of At and 2tr(WSpo) with respect to a 
is necessary. Anyway, E oos displays a well defined minimum at a value of the 
sensitivity parameter a which coincides almost perfectly with the value for a 
with minimum assimilation error. In Figure [TJ the positions of the minima of 
the assimilation error and the linearised out-of-sample error are marked by the 
solid and dashed vertical lines, respectively. In almost all cases, the minima 
of E OOB and the assimilation error coincide perfectly within the resolution by 
which the sensitivity parameter a has been sampled. Only for the case of large 
noise and observations of every second degree of freedom only, a discrepancy is 
observed, which however is still very small. 

The dependence of the optimal a (in terms of a minimal tracking error) on 
the experimental setup, in particular on the amplitudes of both the dynam ical 
and observational perturbations, was already investigated in lBrocker (|20ld) . In 
that study, it emerged that the minimum of the assimilation error moves towards 
smaller values of a with increasing observational noise, and towards larger val- 
ues with increasing dynamical perturbations. This effect is encountered in the 
present experiments as well. Bearing in mind that the sensitivity increases with 
increasing a, this means qualitatively that the larger the observational noise, 
the smaller the sensitivity should be; larger dynamical perturbations though 
require a larger sensitivity. 

According to Equation (fTQ| . the difference between the assimilation error 
Aa and the out-of-sample error E oos should be independent of a, and be equal 
to (tf — t s )tr(Wpo). In our numerical results, this is not quite the case for 
very large a (i.e. much larger than the optimal value). A possible explana- 
tion is that the sensitivity was estimated using the linear approximation (|17p . 
which essentially assumes a small response of solutions to changes in the input. 
Clearly, this is not quite true for large a. Away from extremely large values of 
a though, we found the difference between the assimilation error Aa and the 
out-of-sample error E OOB not only to be independent of a (as can be discerned 
already from Fig. [IJ , but also quantitatively in accordance with what our theory 
predicts. We can conclude that, despite several approximations, our approach 
yields quantitatively correct estimates of the out-of-sample error for relevant 
ranges of the sensitivity parameter a. 

Another question which might arise naturally is to what degree the dynami- 
cal perturbations u carry information about the unmodelled degrees of freedom. 
We might expect, provided "all went well", that the perturbations are similar to 
the unmodelled degrees of freedom at least in some statistical sense. Learning 
something about the true underlying dynamics is a possible application of the 
concepts proposed in this paper, which will be subject to further investigation. 
A few very preliminary results shall be presented here. In Figure [5J several 
distributions (in the form of probability density functions) of the dynamical 
perturbation Ui are shown. Panel (a) and (b) correspond the small noise case 
and the large noise case, respectively; both cases used complete information on 
the slow degrees of freedom. Both panels shows distributions for u correspond- 
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b) 




Oil Oil «3 

panel (a) 2.55 4.1 5.66 
panel (b) 2.86 4.41 5.97 

Figure 2: The distributions of the actual fast degrees of freedom (more specifi- 
cally — 7^ and and the dynamical perturbations Ui are shown for three values 
of a. The variable h on the ordinate stands for cither —jZi and Uj. Panel (a) 
shows the small noise case, Panel (b) the large noise case. Both cases used com- 
plete information about the slow degrees of freedom. In both panels the solid 
line shows the distribution of Zi, while the dotted, dashed, and dash-doted lines 
show the distribution of Ui for three different (increasing) values of a. In both 
panels, the dashed line corresponds to a with a minimal out-of-sample error. 
The table gives the actual values of log( y^j) corresponding to the displayed 
cases. 
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ing to three different values of a (dashed, dotted, and dash-dotted lines, in 
increasing order of a). In both panels, the dashed line corresponds to that a 
which gave a minimal out-of-sample error. Further to that, the distribution of 
the sum over the fast, unmodeled degrees of freedom is shown, more specifically, 
the distribution of —7^ (solid line). Pending a more detailed analysis, visual 
inspection already shows, reassuringly, that the distribution of the unmodeled 
degrees of freedom agrees best with the distribution of u for the optimal value 
for a, that is, the a value giving a minimal out-of-sample error. For a values 
smaller than the optimal one, the distribution is too wide, while a too large 
value of a gives a distribution of perturbations u which appears to be narrower 
than that of the fast degrees of freedom. The actual values for a (or rather, for 
log(-j3— ) for comparison with Fig. [1} can be found in the table complementing 
Figure [5J 

It is worth stressing that the the estimated distributions (or other statistical 
properties) of the perturbations will depend not only on the actual model error 
but also on the specific assimilation scheme. In the present case for example, 
the fast degrees of freedom seem to have a nonzero mean value, while the distri- 
butions of u are centered closer to zero. This is presumably due to the specific 
form of the functional A (Equ. ([52])), which clearly favours a perturbation u 
with a small mean value. In other words, as estimators of the fast degrees of 
freedom, the Ut are expected to be biased towards zero. This may be remedied 
by introducing an offset in the penalization term for the control in the functional 
A. This offset can than be treated as an additional control parameter and its 
optimal value may again be estimated by the method introduced in this article. 

Concerning the equivalence of the two approaches for calculating the sensi- 
tivity in variational data assimilation problems (as detailed in Subsections 13.11 
and !3.2l), numerica l experiments were carried out employing the Lorenz'63 sys- 
tem (jLorenzl 1963 ). These experiments will not be discussed in detail here; For 
a comprehensive report of the experiments, see iBrockerl (|201f)h . The Lorenz'63 
system is a simple dynamical system with three degrees of freedom, which ex- 
hibits chaotic motion. For systems of this size, the collocation approach leads 
to perfectly manageable problems. The sensitivity was calculated using the 
methodology of both Subsection 13.11 and 13.21 The results turned out to be in 
perfect agreement. 



5.2 Statistical interpretation of the optimal a 

It was already mentioned that there is an alternative interpretation of the vari- 
ational approach, namely as a BLUE or more generally a maximum-aposteriori 
estimator. One of the problems with this interpretation was that it required 
the weighting matrices aR and (1 — a)Q to be equal to the inverse observa- 
tional and dynamical error covariances, respectively. Given that we now have 
a methodology to find a, the question arises whether this provides us with rea- 
sonable estimates of observational and dynamical error covariances. In order 
to investigate this question further, we carried out several simulations, with 
the following setup. We again generated "reality" using the Lorenz'96 equa- 
tions (|5l|) . but replaced the term —jZi(t) by white noise gi(t) with covariance 
function E [gi{t)gj(s)\ = q5ijS(ti — t 2 ). Equation (l54l) (with L = 32 degrees of 
freedom) was then integrated using a stochastic Euler scheme. The observations 
(all degrees of freedom were observed) where created as before, and the data 
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ln[ 9 /(>Af)] 



4 6 8 

ln[q/(p At)] 



Figure 3: log is plotted vs. log , with a being the optimal value for the 
assimilation error in plot (a), resp. the out of sample error in plot (b). Various 
choices of q and po are shown. The straight line has been included to guide the 
eye and corresponds to log — log . 



assimilation machinery was applied, including determining the out-of-sample 
error and the optimal a. 

Ideally, we would like to compare a with a version of BLUE in continuous 
time and with 5-correlated disturbances. Pending a more rigorous discussion 
of such a theory, we have to make do here with the following heuristics. As 
was discussed in Section HOI we assume the observational noise process r t do 
be interpolated from samples of a process v t with short correlation and variance 
po- Since the integral over the entire power spectrum of v t must be equal to 
po, it is, strictly speaking, impossible that v t has a fiat power spectrum with 
truly unlimited bandwidth. However, we might alternatively obtain r t by low- 
pass filtering (with appropriate cutoff) a white noise process v t with correlation 
function E \v t v^\ = a-5(t—s); in order that r t has the power spectrum required in 
Equation (fTHjl . we need to set a 2 = po ■ At. We tentatively interprete the action 
integral (|22p as a finite bandwidth approximation of the logarithmic aposteriori 
of the orbit {xt,t 6 [£ s ,t/]}, for infinite bandwith observational and dynamical 
noise. (At present, we believe that the action integral f|22[) needs to be modified 
in order to survive the limit of infinite bandwidth.) Given this interpretation is 
correct, we should have 



log = log - = log — . 

1 — a a po ■ At 



(66) 



Our numerical experiments indicate that relation (1661) is indeed correct. 
In total, 15 simulations were run, with both po and q ranging between 0.01 
and 2. The simulations where collated in five groups of three each, with q/r 
constant within each group; the five groups corresponed to the ratios q/r — 
[1.5, 1, 2, 5, 10]. We then determined a s by optimising the out-of-sample error. 
Further, a op t optimal for the true assimilation error was recorded. In Figure [31 
plot (a), log 1 "^ t pt - is shown versus log pQ q At - Clearly, the two values agree al- 
most perfectly. In Figure G3 plot (b) , log 1 "^ t is shown versus log pg q At , again 
with very good agreement. A few further simulations were run with different 
At (not shown), confirming that the scaling with At is indeed as indicated 
by Equation (|66p . The conclusions we can draw from these investigations are 
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that firstly, since a s is very close to a pt > our method again provides an a which 
very nearly minimises the assimilation error. Secondly, we see that a op t is indeed 
very close to the logarithmic ratio of the noise intensities; thereby, the presented 
methodology might also be seen as a method for estimating the dynamical noise 
intensity. These results clearly call for further theoretical investigation, which 
has to be deferred to a future paper. 

From the heuristics presented above, it seems that the variational approach 
to data assimilation as studied here can indeed be regarded as a finite band- 
width approximation to a maximum-aposteriori trajectory estimator for prob- 
lems with white obse r vation al and dynamical noise. As mentioned in Section |31 



Zeitouni and Dembol (| 1987T) discuss a maximum-aposteriori concept, but their 



expression for the logarithmic aposteriori differs from our action integral by 
a term involving the divergence of the vector field / (and other terms that 
do not bear on the minimisation) . Interestingly though, the divergence of the 
Lorenz'96 system is constant; therefore in the present situation our action inte- 
gral is equivalent to the maximum-aposteriori concept of Zeitouni and Dembo, 
if relation (j6"6")l is in force. We speculate that in general, relation (j6"6")l is true 
only if the correct form of the logarithmic aposteriori is employed. 

Encouraged by these findings, we might look back at Section lSTTl and see if an 
effective white noise can be associated with the model error term —jZi despite 
the fact that there, the Zi were not stochastic but the fast degrees of freedom. 
From our numerical simulations, we estimated V&r[yZi} to be about 2.28 • 10~ 4 
and 2.016 for the small resp large noise cases considered in Section fSTTI fcf. also 
Fig. [5]). For the ratio -J*j-h^iJ we obtain 0.4453 and 0.3937; the corresponding 
values for were 60.3403 and 82.2695. Unfortunately, we were unable to 
relate the noise ratio with the corresponding a-values. Note that the variance 
ratio is not proportional to the a-ratio, so a simple rescaling with some effective 
sampling time, for example, could give a very approximate correspondence at 
best. In fact, simply scaling with 5t yielded completely wrong results. We 
conclude that for the purpose of variational data assimilation, interpreting the 
fast degrees of freedom as white noise model error can be very dangerous indeed. 
Clearly, the fast degrees of freedom differ from white noise in a large number of 
ways, but it would still be interesting to know why we see so different behaviour 
between these experiments and those with white noise perturbances. This will 
be subject to future research. 



5.3 Synchronisation 

As a second example, we have studied assimilations by means of synchronisation. 
In the particular setup we studied, the model is coupled to the observations 
through a simple linear coupling term 

±i = -x^i(x^ 2 - Xi+i) - Xi + F + K(r)i - Xi), 

where k is the coupling constant. We can expect that, if the coupling is too 
strong, the observations will be tracked too rigorously and hence observational 
noise is not filtered out well. On the other hand, if k takes to small values, the 
observations are tracked poorly and, as an additional consequence, model errors 
are not compensated for. Hence, again, we can expect the assimilation error to 
take a minimum at some nontrivial value of k. 
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Figure 4: For synchronisation, E oos (♦), At (a), 2<j 2 S (•), and Aa (■) are 

plotted vs. k, for the small noise (a) and large noise (b) cases. Observations 
where taken of all slow degrees of freedom. The vertical line marks the value of 
the coupling constant at which the assimilation error gets minimal, K op t. Note 
that the position of the minimum of E oos coincides perfectly with K op t- E oos 
and At have been shifted on the ordinate for better visibility. 



For the numerics, the setup described in Section 2] with Cjj = <5jj for a 
total time interval of length T = 2 22 • St was simulated, where St = 1CP 5 and 
observations where taken at sampling intervals At = 5.12 ■ 10 -3 . The model was 
integrated with a time step 5t mo a c \ = 2 A St — 1.6 ■ 10~ 4 . Again, the observations 
rji{t) where interpolated by means of cubic splines at intermediate times where 
no observations where recorded. 

For the simple synchronisation setup studied here, we get from Equation ([53]) 
that the sensitivity per unit time and per observed degree of freedom is given by 
ifcAi. To calculate the tracking and assimilation errors, a transient time was 
ignored to give the system time to synchronise. In Figure 0] the out-of-sample 
error (diamonds) is presented, together with the tracking error (triangles), the 
sensitivity as 2p 2 S (bullets), and the assimilation error (squares) for the syn- 
chronisation scenarios corresponding to the two cases (weak noise (a) and large 
noise (b)) with complete information of the slow degrees of freedom, for various 
choices of the coupling parameter k. Again, At and E ao!S have been shifted on 
the ordinate for better comparison. Just as in the case of the variational as- 
similation, the tracking error decreases monotonously with increasing coupling 
strength, while the sensitivity increases monotonously. Again, the linearised 
out-of-sample error shows a well defined minimum at a certain value of /c op t, 
which coincides perfectly, within the studied resolution, with the k at which the 
assimilation error is minimal. To guide the eyes, we plot a vertical line to mark 
the positions of the minima. The extremely high values of K op t in the low noise 
case can be easily understood when having in mind that, due to the the small 
observational noise, a rigorous tracking of the observed signal does not intro- 
duce large dynamical perturbations. As an interesting aside, note that although 
the tracking error decreases monotonically when increasing k, the strength of 
the dynamical perturbations, f. f dt 5Z i=1 [/«(??i — Xi)] 2 /[(tf — t s )L] = k 2 At, does 
not decrease. In fact, it displays a well defined minimum, meaning that, at this 
value of k, the influence of the coupling term gets minimal. Note that this value 
of k does not correspond to the k value at which the out-of-sample error gets 
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Figure 5: The squared coupling term, k 2 At, is plotted vs. k (•). The vertical 
line denotes the value of the coupling strength at which the assimilation error 
gets minimal, K opt . Note that the minimum of k 2 At does not coincide with 

Kopt- 
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minimal. To demonstrate this statement, we plot k 2 At, for the large noise case 
with observations at all degrees of freedom (see Fig. [SJ . The vertical line marks 
K pt, at which the assimilation error and the linearised out-of-sample error dis- 
play their minima. Note that the minimum of the coupling term lies far away 
from K pt- Rather, it appears that the minimum of k 2 At is determined by the 
value of k at which, in the perfect model case, the phase transition would be 
expected. 

The coupling term, in case of synchronisation, is in some sense analogous 
to u t in the variational assimilation case, whence k 2 A^ is analogous to Am in 
that it describes the average deviation from the pure model. In contrast to 
k 2 At though, Am is monotonously increasing with increasing sensitivity. On 
the other hand, it might be argued that the analogy between k 2 At and Am is 
warranted only far beyond the phase transition. In that range, k 2 At and Am 
show qualitatively the same behaviour. 

6 Conclusions and outlook 

When assimilating observational data into a dynamical model, then solutions 
which closely track the observations typically do not exactly adhere to the model 
equations, while enforcing the latter would cause unacceptably large deviations 
from the observations. Thus, in data assimilation one faces a fundamental 
trade-off, caused by model error. This trade-off is investigated in this paper. 

To settle the trade-off in real-world situations, a minimal out-of-sample 
error is proposed as a criterion. As was shown, the out-of-sample error is con- 
nected to the assimilation error, but can also be expressed by means of the 
tracking error and the sensitivity of the assimilation scheme. The sensitivity 
quantifies by how much the output of the data assimilation depends on per- 
turbations of the observed time series. The tracking error is available under 
operational circumstances. It is demonstrated that also the sensitivity can, at 
least in principle, be estimated operationally. 

The details depend on the specifics of the employed data assimilation al- 
gorithm. In this paper, both variational data assimilation as well as synchro- 
nisation (aka nudging) are looked at. It is stressed that when calculating the 
sensitivity, several quantities can (and should) be recycled which are available 
already from the data assimilation algorithm proper, thereby saving compu- 
tational resources. The feasibility of the proposed schemes is demonstrated 
through numerical experiments involving the Lorenz'96 system. 

The variational approach studied here might be interpreted as a (contin- 
uous time nonlinear) analogy of the best linear unbiased estimator (BLUE). 
This interpretation comes with a specific recommendation as to how to set the 
sensitivity parameter, namely as the ratio between the dynamical and observa- 
tional error covariances. This recommendation was tested against the method 
investigated here; we found that if the assumptions behind BLUE apply, both 
methods gave essentially the same results (yet ours needs less information to be 
applicable). For model error consisting of fast dynamical variables though, we 
found BLUE to perform very poorly. 

In the future, it would be desirable to apply the presented method to realistic 
weather models. Such models typically include various inequivalent degrees of 
freedom, necessitating more than just one sensitivity parameter. In principle, 
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the approach is straightforwardly applicable to the case of multiple sensitivity 
parameters, although exploring a larger dimensional parameter space will of 
course increase the numerical costs considerably. 

In order to keep things concise, simple quadratic forms for the penalisation 
terms are used in this article. Furthermore, simple additive observational errors 
and dynamical perturbations are chosen. In realistic situations, more involved 
choices might be appropriate. However, for the most part, the discussion does 
not rely on these choices, but readily applies to more complicated cases. In 
any event, the a priori knowledge of the observational noise characteristics is 
crucial for good parameter estimations. It would therefore be of interest to study 
in more detail the impact of wrong assumptions concerning the observational 
noise on the obtained solutions. Furthermore, in more realistic setups, the 
observational noise might display correlations, in which case the methodology 
still applies, but requires more involved calculations. 

The proposed procedure should not only be applicable to determining opti- 
mal sensitivity parameters but also to obtain better estimations of model param- 
eters. Again, the advantage of the proposed procedure would be that parameters 
are assessed by means not of the (in sample) tracking error but rather of the 
assimilation error, which is a better performance measure for data assimilation 
systems. 

Finally, a general discussion as to which measure for assimilation perfor- 
mance is appropriate to determine the sensitivity parameter would be desirable. 
While the minimisation of the assimilation error appears to be a sensible choice 
for the assessment of trajectories obtaine d by assimilation, ther e have been other 



suggestions in the past. For example, in lSzendro et al.l (|2009ft it has been pro- 



posed to assess the quality of an assimilated solution taking into account the 
spatial structure of the assimilation error. The reason is the observation that 
even if the assimilation error is small, the reconstructed trajectory does not 
necessarily correspond to a typical trajectory of the reality, and might therefore 
yield poor predictions. Nonetheless, numerical experiments (not shown here) 
indicate that sensitivity parameters optimal with respect to the assimilation 
error do not coincide with s ensiti vity parameters according to the methodology 
proposed in lSzendro et al. I (l2009h . 
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Appendix: Structure of the collocation equations 
for the BVP solver bvp4c 

In this appendix, we will be assuming that the BVP is given as 

z t = F(t,z t ), te[t s ,tf], b(z ta ,z tf ) = 0, 

with z t £ M. 2D , F a vector field on M. 2D and &(., ..) a function representing the 
boundary conditions. Suppose t s = to < ... < tjy = tf are temporal mesh 
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points; write Zi := z ti and z := (zq, . . . , 2jv) for the values of the solution at 
these points. The collocation equations 

$ i (z) = Q, i = 0...N, 

are so lved to obtain z. The BVP solver bvp4c by iKierzenka and Shampine 
(|200lh implemented in Matlab uses a fourth order implicit Runge-Kutta scheme 
to approximate the differential relation, in which case 

for i = 1 . . . N with hi = t l+1 - t t , t, t _ 1/2 = U-i + ^f 1 , and 
F% F(t{ , z^) , 

= F ^ (Fi - Fi-i) 

Finally, $0(2) = b(zo,ZN) represents the boundary conditions. Using these 
expressions, we get the following partial derivatives of $: 

since the boundary conditions do not depend on F, and furthermore 

<9$, 1 , 1 



-zhi-i + 7^-1/2/^-1 



6 12 

2 

= --/lj-1 



0*1-1/2 3 

<"K 1, 1 



for i = 1 . . . N. where 



Ji-l/2 

OF ( hi-x Zi-i + Zi h-i \ 
= ^z-{ U - 1 + —— r (F^F^)j. (67) 

The algorithm bvp4c provides a numerical approximation to this quantity. 
These relations as well as the inverse of the collocation Jacobian (also available 
from bvp4c) can be used to compute the sensitivity through Equation (I3"6l) . 
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